Cache Awareness in Blocking Techniques
Authors
Abstract
To date, data locality optimizing algorithms mostly aim at providing strategies for blocking and reordering loops. But little research has been devoted to the final step: finding the optimal block size, i.e., a block size that provides the best possible performance. Optimal block sizes are currently computed as if a cache were a local memory, i.e., cache interferences are ignored. Case studies have already shown that cache interferences can greatly affect the optimal block size value. The purpose of this article is to show that analytical modeling of cache interferences can be used to compute near-optimal block sizes for blocked loop nests. First, the method for evaluating cache interferences is presented. Second, the model is validated by correlating the estimated miss ratio with the simulated miss ratio and the execution time of various loop nests. Then, current techniques for computing the optimal block size are analytically and experimentally shown to yield below-optimal performance. Finally, current block size computation techniques are augmented with analytical modeling of cache interferences and TLB misses, and this new technique is shown to yield near-optimal performance and make blocking techniques safe. Reciprocally, it is also shown that even when no capacity miss occurs, finely tuned blocking techniques can be used to significantly reduce the number of cache interferences.
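The contrast drawn in the abstract can be made concrete with a minimal sketch. It is not the article's analytical model: `local_memory_block_size` is the classic "cache as local memory" heuristic the abstract criticizes, `blocked_matmul` is a generic blocked loop nest chosen for illustration, and `tile_footprint` is a hypothetical helper that counts how many sets of an assumed direct-mapped cache one tile actually occupies. All function names, the cache geometry (32 KB, 32-byte lines), and the row-major layout starting at address 0 are assumptions.

```python
import math


def local_memory_block_size(cache_bytes, elem_bytes, tiles=3):
    """Largest bs such that `tiles` square bs x bs tiles fit in the cache.

    This treats the cache as a local memory and ignores interferences.
    """
    return math.isqrt(cache_bytes // (tiles * elem_bytes))


def blocked_matmul(a, b, n, bs):
    """Blocked n x n matrix product; bs is the block size being tuned."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                # Work on one bs x bs tile of each operand.
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + bs, n)):
                            c[i][j] += aik * b[k][j]
    return c


def tile_footprint(n, bs, cache_bytes, line_bytes, elem_bytes):
    """Return (distinct memory lines, distinct cache sets) touched by the
    top-left bs x bs tile of a row-major n x n array, assuming a
    direct-mapped cache and an array that starts at address 0.
    """
    num_sets = cache_bytes // line_bytes
    lines = set()
    for i in range(bs):
        for j in range(bs):
            lines.add(((i * n + j) * elem_bytes) // line_bytes)
    sets = {line % num_sets for line in lines}
    return len(lines), len(sets)


if __name__ == "__main__":
    bs = local_memory_block_size(32768, 8)
    print(bs)                                      # 36
    print(tile_footprint(512, bs, 32768, 32, 8))   # (324, 72)
    print(tile_footprint(500, bs, 32768, 32, 8))   # (324, 324)
```

Under these assumptions, the same block size of 36 needs 324 cache lines per tile in both cases, but with a leading dimension of 512 those lines compete for only 72 cache sets, so the tile evicts itself even though it "fits". With a leading dimension of 500 every line gets its own set. A capacity-only computation cannot tell these two cases apart.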
Similar resources
Cache Awareness in Blocking Techniques
To date, data locality optimizing algorithms mostly aim at providing efficient strategies for blocking and reordering loops. But little research has been devoted to the final step, i.e., computing the optimal block size. Optimal block sizes are currently computed as if a cache behaves as a local memory, i.e., cache interference phenomena are ignored. Case-studies have already shown that cache inter...
In-Core Optimization of High-Order Stencil Computations
In this paper, we apply in-core optimization techniques to high-order stencil computations, including: (1) cache blocking for efficient L2 cache use; (2) register blocking and data-level parallelism via single-instruction multiple-data (SIMD) techniques to increase L1 cache efficiency; and (3) software prefetching techniques. Our generic approach is tested with a kernel extracted from a 6th-or...
DRAFT: Polynomial Multiplication: Blocking to Improve Cache Performance
We search for techniques to decrease the multiplication time for large sparse polynomials in Lisp by speeding up the sequential accesses of large vectors. We do this by utilizing blocking to improve cache performance, which we show to be effective for sufficiently large problems.
Influence of Cross-interferences on Blocked Loops: a Case Study with Matrix-vector Multiply
State-of-the-art data locality optimizing algorithms are targeted at local memories rather than cache memories. Recent work on cache interferences seems to indicate that these phenomena can severely affect the cache performance of blocked algorithms. Because of cache conflicts, it is not possible to know the precise gain brought by blocking. It is even difficult to determine for which problem sizes b...
Using Cache as a Local Memory
Inability to reuse data, conflicting references, and underutilization of cache capacity are responsible for poor cache performance on various commonly used applications. Data prefetching, blocking, and data copying have been used to address these problems. These techniques, though effective, are directed towards solving one aspect of the overall problem. We propose a comprehensive solution to t...
Publication date: 1998